Welcome and Chapter 1

Tyler George

Cornell College
STA 363 Fall 2024 Block 1

Welcome!

Instructor

Course logistics

  • Course Dates: August 26th to September 18th
  • Course sessions: M-F, 9am-11am and 1pm-3pm
  • Exam Dates: September 6th and 18th

Generalized Linear Models

In statistics, a generalized linear model (GLM) is a flexible generalization of ordinary linear regression. The GLM generalizes linear regression by allowing the linear model to be related to the response variable via a link function and by allowing the magnitude of the variance of each measurement to be a function of its predicted value.
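
In symbols, the two ingredients above can be sketched as follows. The notation is an assumption for illustration and does not come from the slides: \(\mu_i = E(Y_i)\) is the mean response, \(g\) is the link function, \(V\) is the variance function, and \(\phi\) is a dispersion parameter.

\[\begin{aligned} g(\mu_i) &= \beta_0 + \beta_1~x_i \\ \text{Var}(Y_i) &= \phi~V(\mu_i)\end{aligned}\]

Ordinary linear regression is the special case with the identity link \(g(\mu) = \mu\) and constant variance \(V(\mu) = 1\).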

Logistic regression

\[\begin{aligned}\pi = P(y = 1 | x) \hspace{2mm} &\Rightarrow \hspace{2mm} \text{Link function: } \log\big(\frac{\pi}{1-\pi}\big) \\ &\Rightarrow \log\big(\frac{\pi}{1-\pi}\big) = \beta_0 + \beta_1~x\end{aligned}\]
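
The logit link above can be tried numerically. What follows is a minimal sketch in Python using only the standard library; the function names `logit` and `inv_logit` and the coefficient values are hypothetical, chosen only to illustrate the formula, and are not part of the course materials.

```python
import math

def logit(p):
    """Log-odds of a probability p, with 0 < p < 1."""
    return math.log(p / (1 - p))

def inv_logit(eta):
    """Map a linear predictor eta back to a probability."""
    return 1 / (1 + math.exp(-eta))

# Hypothetical coefficients, chosen only for illustration
beta0, beta1 = -1.5, 0.8

for x in (0, 1, 2, 3):
    eta = beta0 + beta1 * x   # linear predictor, i.e. the log-odds
    pi = inv_logit(eta)       # P(y = 1 | x)
    print(f"x = {x}: log-odds = {eta:.2f}, pi = {pi:.3f}")
```

Each one-unit increase in x adds \(\beta_1\) to the log-odds, which multiplies the odds \(\pi/(1-\pi)\) by \(e^{\beta_1}\).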

What we’re covering this semester (1/3)

Generalized Linear Models (Ch 1 - 6)

  • Introduce models for non-normal response variables
  • Estimation, interpretation, and inference
  • Mathematical details showing how GLMs are connected

What we’re covering this semester (2/3)

Modeling correlated data (Ch 7 - 9)

  • Introduce multilevel models for correlated and longitudinal data
  • Estimation, interpretation, and inference
  • Mathematical details, particularly diving into covariance structures

What we’re covering this semester (3/3)

More Regression Models (ITSL Chapter 7)

  • Polynomial Regression
  • Regression Splines
  • Smoothing Splines
  • Generalized Additive Models (GAMs)

Meet your classmates!

  • Create larger groups
  • Quick introductions - Name, year, and major
  • Choose a reporter
    • Need help choosing? Person with birthday closest to December 1st.
  • Identify 8 things everyone in the group has in common
    • Not being a Cornell Student
    • Not clothes (we’re all wearing socks)
    • Not body parts (we all have a nose)

Reporter will share list with the class

What background is assumed for the course?

Pre-reqs

  • STA 201, 202 and DSC 223

Background knowledge

  • Statistical content
    • Linear and logistic regression
    • Statistical inference
    • Basic understanding of random variables
  • Computing
    • Using R for data analysis
    • Writing reports using R Markdown or Quarto

Course Toolkit (1/2)

Class Meetings

Lectures

  • Some traditional lecture
  • Individual and group labs
  • Bring fully-charged laptop
  • Mini-projects
  • Exams

Attendance is expected (if you are healthy!)

Textbook

Beyond Multiple Linear Regression by Paul Roback and Julie Legler

  • Available online
  • Hard copies available for purchase

Textbook 2

An Introduction to Statistical Learning with Applications in R by Gareth James, Daniela Witten, Trevor Hastie, and Robert Tibshirani (Chapter 7)

  • Freely available online
  • Hard copies available for purchase

Using R / RStudio

Activities & Assessments

Readings

  • Primarily from Beyond Multiple Linear Regression
  • Recommend reading assigned text before lecture

Homework

  • Primarily from Beyond Multiple Linear Regression
  • Individual assignments
  • You may work together, but you must complete your own work: discuss, but don’t copy

Activities & Assessments

Mini-projects

Examples:

  • Mini-project 01: Focused on models for non-normal response variables, such as count data
  • Mini-project 02: Focused on models for correlated data
  • Short write-up and short presentation
  • Team-based

Exams

  • Two exams this block: September 6th and 18th

  • Each will have two components

    • Component 1 takes place on the exam dates, and you may choose an oral or written format.
    • Component 2 is a take-home, open-book, open-note exam. You will have 12 hours or more to complete it.

Grading

Final grades will be calculated as follows:

Category                  Points
Homework                  200
Participation             100
Labs and Mini Projects    300
Exams                     400
Total                     1000

See the syllabus on the course website for letter grade thresholds.

Resources

Chapter 1